Recent Trends in Digital Text Forensics and Its Evaluation - Plagiarism Detection, Author Identification, and Author Profiling
Abstract
This paper outlines the concepts and achievements of our evaluation lab on digital text forensics, PAN 13, which called for original research and development on plagiarism detection, author identification, and author profiling. We present a standardized evaluation framework for each of the three tasks and discuss the evaluation results of the 58 contributions submitted in total. For the first time, instead of accepting the output of software runs, we collected the software itself and ran it on a computer cluster at our site. As our evaluation and experimentation platform we use TIRA, which is being developed at the Webis Group in Weimar. TIRA can handle large-scale software submissions by means of virtualization, sandboxed execution, tailored unit testing, and staged submission. Beyond the evaluation results themselves, a major achievement of our lab is that we now have at our disposal the largest collection of state-of-the-art approaches to these tasks for further analysis.
Similar resources
Overview of the PAN/CLEF 2015 Evaluation Lab
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...
Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre persp...
Ensemble Learning Approach for Author Profiling
With the evolution of the Internet, author profiling has become a topic of great interest in fields such as forensics, security, marketing, and plagiarism detection. However, the task of identifying the characteristics of an author based solely on a text document has its own limitations and challenges. This paper reports on the design, techniques and learning models we adopted for the PAN-2014 Author P...
A Survey on Authorship Profiling Techniques
Authorship analysis is a text analysis technique that mainly takes three forms, namely Authorship Profiling, Authorship Identification and Plagiarism Detection. This paper presents a brief survey of recent developments in author profiling approaches. Authorship Profiling aims to ascertain various author characteristics like age, gender, native co...
A Systematic Review on Author Identification Methods
Author Identification is a technique for identifying the author of an anonymous text. It has a history of about 130 years, starting with the work of Mendenhall in 1887. Applications of author identification include plagiarism detection, detecting anonymous authors, forensics and so on. In this paper the authors outline features used for author identification, such as vocabulary, syntactic and others. ...